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ABSTRACT 

This paper presents a fast, economical particle- multiple- mesh TV-body code optimized 
for large-TV modelling of collisionless dynamical processes, such as black-hole wan- 
dering or bar-halo interactions, occurring within isolated galaxies. The code has been 
specially designed to conserve linear momentum. Despite this, it also has variable soft- 
ening and an efficient block-timestep scheme: the force between any pair of particles is 
calculated using the finest mesh that encloses them both (respecting Newton's third 
law) and is updated only on the longest timestep of the two (which conserves momen- 
tum). For realistic galaxy models with TV > 10 6 , it is faster than the fastest comparable 
momentum-conserving tree code by a factor ranging from ~ 2 (using single timesteps) 
to ~ 10 (multiple timesteps in a concentrated galaxy). 
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1 INTRODUCTION 

Newton's third law is a central pillar of physics. Much of 
what we know about the dynamical evolution of galaxies 
comes from TV-body simulation, but most TV-body codes 
use approximations that break the third law. A well-known 
example of the consequences of breaking it is provided by 
the sinking satellite problem (Hernquist & Weinberg 1989; 
Weinberg 1989; Velazquez & White 1999); the dynamical 
friction felt by the satellite is grossly overestimated if one 
"pins" the centre of the host galaxy, ignoring the galaxy's 
I = 1 dipole response. This example is perhaps extreme, but 
there are many other situations where one is interested in 
the detailed response of a galaxy to asymmetric perturba- 
tions and would like to be able to model it without having 
to worry about artifacts arising from violations of Newton's 
third law. Examples include modelling bar-halo interactions 
(see McMillan & Dehnen (2005) and references therein) and 
the wandering of central supermassive black holes. 

This paper describes the TV-body code GROMMET 
(GRavity On Multiple Meshes Economically and Transpar- 
ently), which has been designed specifically to model the 
detailed dynamical evolution of individual galaxies without 
using any approximations that violate Newton's third law. 
I assume that the galaxy is collisionless. It is completely 
described by a distribution function (DF) f(x,v;t), which 
gives the (mass) density of particles in phase space, along 
with the potential &(x;t) generated by this DF and any 
external sources. The evolution of / is governed by the col- 
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lisionless Boltzmann equation (CBE), 



df 



v/- 



dv 



(1) 



where the accelerations a = — d&/dx. As Hernquist & Os- 
triker (1992) and Leeuwin, Combes & Binney (1993) empha- 
sise, in a collisionless A-body code particles are not to be 
thought of as representing stars or groups of stars. Instead 
one is using the method of characteristics to integrate (1), 
estimating the accelerations a(x) by Monte Carlo sampling. 
Of course, the shot noise in these estimates means that in 
practice any simulation will never be perfectly collisionless. 
Therefore it is important to make TV as large as possible in 
order to minimize the effects of this noise. So, GROMMET has 
been designed to be both fast and economical on memory. 

In section 2 below I describe the multiple-mesh proce- 
dure used by GROMMET to estimate accelerations. Section 3 
shows how this leads naturally to a momentum-conserving 
block-timestep integrator based on Duncan, Levison & Lee's 
(1998) potential-splitting scheme. In section 4 I present the 
results of some tests and also compare GROMMET's perfor- 
mance against other codes'. Section 5 sums up. For com- 
pleteness, I include in an Appendix an explanation of James' 
(1977) method, which is used in Section 2. 



2 POTENTIAL SOLVER 

The task of the potential solver in a collisionless TV-body 
code is to estimate the accelerations 



a(x) 
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where one does not know the density distribution p(x) ex- 
plicitly, but instead only has a discrete sample of N particles 
with positions Xi and masses rm drawn from it. 

2.1 Particle- mesh method 

At the heart of GROMMET's potential solver is the particle 
mesh (PM) method (Hockney & Eastwood 1988). It uses a 
cubical mesh, with vertices at positions spaced a dis- 

tance h apart. The procedure for obtaining an initial esti- 
mate of the accelerations (eq. 2) felt by each particle follows. 

(i) Loop over all N particles using cloud-in-cell inter- 
polation to build up the discretized density distribution 

Pijk 

(ii) Calculate the potential §ijk corresponding to this pijk 
using James' (1977) method (see Appendix); 

(iii) Looping again over all N particles, use a finite- 
difference approximation to estimate accelerations -d^/dx 
at the mesh points surrounding each particle, then interpo- 
late the value of the acceleration at the particle's location 
using the same cloud-in-cell scheme employed in step (i) . 

Since steps (i) and (iii) use the same interpolation scheme, 
this procedure produces accelerations that obey Newton's 
third law subject to one extra condition: the finite-difference 
scheme in step (iii) cannot provide meaningful accelera- 
tions for particles that lie in the outermost layer of mesh 
cells, which means that those particles should be omitted in 
step (i). This seems an almost trivial point, but it is impor- 
tant for the refinement scheme introduced below. It turns 
out that for the scheme below to work properly we have to 
peel off the outer two layers of cells. I typically use meshes 
with 64 3 or 128 3 cells, of which then only 60 3 or 124 3 are 
assignable in step (i). 

Apart from respecting Newton's third law, the other 
attractive features of the PM method are its efficiency and 
its linear scaling with N: the time needed to carry out step 
(ii) is independent of N, but for a typical mesh with 64 cells 
the overall time is dominated by the O(N) cost of carrying 
out the assignment steps (i) and (iii) once N > 5 x 10 5 ; 
similarly, the memory needed to store mesh quantities and to 
carry out James' method is negligible compared to that used 
for storing the particles' masses, positions and accelerations. 

The major disadvantage of the PM method is that it 
does not work well for centrally concentrated mass distri- 
butions, since each particle has an effective size of order 
the mesh spacing h. In other words, the mesh spacing sets 
the effective softening length used in the calculation of the 
forces. 



2.2 Refinement scheme 

The natural remedy of this shortcoming is to introduce finer 
submeshes in interesting, higher- density regions and to re- 
calculate the accelerations for particles inside each submesh. 
But how best to include the effect of the parent mesh's grav- 
ity field on the accelerations calculated in each submesh and 
vice versa? One possibility is to solve Poisson's equation on 
the submesh subject to boundary conditions interpolated 
from the parent mesh (e.g., Anninos, Norman & Clarke 1994; 
Jessop, Duncan & Chau 1994). This is a key element of 
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Figure 1. An example of the multiple mesh scheme used to cal- 
culate accelerations. Particles A, B and C all lie within the region 
covered by the outer, coarse mesh, but B and C also lie inside 
the fine, inner mesh. An initial estimate of the forces on all three 
particles comes from using the PM method on the coarse mesh. 
This is refined by isolating those particles within the inner mesh, 
recalculating their interparticle forces first using the fine mesh, 
then using the coarse, and adding the difference to the initial 
coarse-mesh estimate. Therefore, the force between A and each 
of B and C is obtained using the coarse mesh, but that between 
B and C comes from the fine mesh. In all cases Newton's third 
law is respected. 

the widely- used family of multigrid methods (e.g,. Kravtsov, 
Klypin & Khokhlov 1997; Knebe, Green & Binney 2001), 
and would be straightforward to apply in GROMMET using 
the method of equivalent charges (see Appendix). However, 
all of these schemes violate Newton's third law, as one can 
easily see by considering the force between a particle inside 
a submesh and another one outside. 

GROMMET instead uses a simplified version of the 
scheme proposed by Gelato, Chernoff & Wasserman (1997) 
(see also figure I). The acceleration felt by each particle is 
calculated using a series of nested "boxes". We start with 
the outermost toplevel box, which discretizes the simula- 
tion volume into, say, n x x n y x n z = 60 3 assignable cells. 
This box, like any other box, can contain zero, one or more 
subboxes. Each subbox contains two meshes: a coarse one 
composed of an (n x /2) x (n y /2) x (n z /2) subblock of the 
parent's cells, and a fine one that covers the same subblock 
twice as finely in each direction, with n x x n y x n z cells. 

For the most common situation in which each box con- 
tains no more than one subbox, the acceleration at any po- 
sition x is given by the sum over all boxes, 

a(x) =^2aj(x), (3) 

3 

where the contribution from the j th box, 

aj(x) = a+(x) - aj(x), (4) 

is the difference between accelerations calculated using the 
PM method on the box's fine (+) and coarse (-) meshes, 
simply ignoring any particles that lie outside. The outermost 
toplevel box (j — 0) has no coarse mesh, so = 0. In 
this scheme the acceleration between any two particles is 
calculated using the box with the finest mesh spacing that 
encloses them both and Newton's third law is obeyed to 
machine precision. This last feature comes at a cost though: 
the acceleration (3) is discontinuous at box boundaries, a 
point to which I return below. 
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Sometimes one might want to refine a region that can- 
not be enclosed within a single subbox. If one simply tiles 
the region using, say, two abutting subboxes, the force be- 
tween particles located at either side of the boundary be- 
tween them will be calculated using the coarse parent mesh, 
which is usually not desirable. The solution is to let the sub- 
boxes overlap by a few mesh cells and then correct eq. (3) for 
the double counting of particles in the overlap region by in- 
troducing a third subbox whose boundaries are given by the 
intersection of the two overlapping subboxes and subtract- 
ing the accelerations (4) obtained in this new subbox. In 
contrast, Gelato, Chernoff & Wasserman (1997) introduce 
a buffer zone around each box and treat particles in the 
buffer zone differently from the rest. Their scheme violates 
Newton's third law. 

I have deliberately omitted any automated scheme for 
deciding where and when to introduce subboxes; these 
schemes inevitably break time-reversibility, and, for the type 
of problem the code was designed for, I expect that the user 
will already have a much better idea of how best to place 
boxes. 



3 MOVING PARTICLES 

The characteristic equation of the CBE is 

dt _ dx _ dv . . 

1 v a 

where the accelerations a(x,t) depend on the DF / through 
cqu. (2). The most straightforward and widely used way of 
following the characteristics is by using a leapfrog integra- 
tor. The (fixed-timestep) leapfrog produces an approximate 
solution to (5) that respects many of its important sym- 
metries; it is symplectic 1 , reversible in time and, when the 
accelerations are obtained using a potential solver that re- 
spects Newton's third law, it conserves linear momentum. 

An unattractive feature of the leapfrog is that it uses 
the same fixed timestep for all particles. Consider a deeply 
plunging radial orbit in a model galaxy with a central den- 
sity cusp or black hole. Integrating this orbit accurately near 
pericentre requires a very small timestep, which, in the stan- 
dard leapfrog scheme, means that all other particles have 
to be integrated using the same small timestep, even those 
on loosely bound circular orbits. This can be prohibitively 
expensive, since it involves calculating the full set of accel- 
erations a(x,t) for all particles at every timestep. 

GROMMET uses a block-timestep scheme to improve effi- 
ciency. Each of the boxes of section 2 above has an associated 
timestep, which can be chosen to be either equal to that of 
its parent box or a factor of two shorter. Broadly speak- 
ing, a particle's position and velocity are updated using the 
shortest timestep of any of the boxes enclosing it, but the 
force between any pair of particles is updated only on the 
timestep of the longest particle, thus conserving linear mo- 
mentum. The rest of this section makes this somewhat vague 
description more precise. 



1 This assumes that the accelerations are smooth, which is not 
the case for many collisionless JV-body codes, including GROMMET. 



3.1 The standard leapfrog integrator 

Recall that a leapfrog integrator with a single, fixed timestep 
t corresponds to motion in a time-dependent Hamiltonian 
(e.g., Wisdom & Holman 1991) 

H = T+ jr 5 e (k-^\v{x 1 ,...,x N ), (6) 

where T = i £\ mivl is the kinetic energy of all the par- 
ticles and 5c(x) = ±(5(x - e) + S(x + e)) with < e « 1. 
The periodic comb of delta functions turns on the potential 
energy V(xi, . . . , xn) only at times t = (k ± e)r for inte- 
ger k. Integrating the resulting equations of motion from 
time t — kr to t — (k + l)r yields 

Vi(k + \)=Vi{k) + \Tai{k), (7) 
Xi(k + l) = Xi(k)+TVi(k+±), (8) 
Vi(k + l)=Vi(k+±) + ±Tai(k + l), (9) 

where the accelerations a 4 (fc) = —dV/nndxi evaluated at 
time t — kr. This is just the sequence of steps for the kick- 
drift-kick form of the leapfrog: the potential is turned on 
briefly just after t — kr resulting in a "kick" (denoted K) 
to the particles' velocities; the particles then "drift" (D) 
along at their new velocities until the potential turns on 
again just before t = (k + l)r, at which point they receive 
another kick. The drift-kick-drift form of the leapfrog can be 
obtained by adding | to the argument of the delta functions 
or, alternatively, by integrating the equations of motion from 
(k — |)r to (k + |)r instead. 

Another way of looking at each of these versions of 
the leapfrog is to consider them as compositions of the 
two time-asymmetric first-order symplectic integrators (each 
applied left to right), K(r/2)D(r/2) and D(t/2)K(t/2), 
whose first-order error terms cancel (e.g., Saha & Tremaine 
1992). In the following I write the leapfrogs as the sequence 
of operations KDDK and DKKD, dropping the (r/2) ar- 
guments. 

3.2 A block-timestep leapfrog 

In GROMMET the accelerations a(x) are given by a sum (3) 
of contributions (4) from boxes with different spatial refine- 
ment levels. The outermost box is associated with a timestep 
to and timestep level I = 0. Each subbox has a timestep 
Ti — 2~ 1 to with timestep level I either equal to that of its 
parent or larger by one. Let us add together all the contri- 
butions (4) to a(x) from boxes having timestep level I and 
write the result as a^^x). Let V(i){x) be the corresponding 
contribution to the potential energy. Instead of turning on 
the full potential V = V(;) at every timestep, consider 
the alternative Hamiltonian 

H = T+f^ E S e (k-^ nr )v {l) (x u ...,x N ), (10) 

1 = k=-oo ^ °' 

where / max is the maximum timestep refinement level and 
each Vjj) is turned on only at times t = 2 _i fcr . This is a 
variant of the potential splitting used by Duncan, Levison & 
Lee (1998) to model close encounters in planetary systems. 

Integrating the equations of motion for this new Hamil- 
tonian results in a nested sequence of KDDK leapfrog 
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Level 0: K D D K 

Level 1 : K DD KK DD K 

Level 2: KDDK KDDK KDDK KDDK 

Figure 2. The sequence of steps for motion in the Hamiltonian (10) with two levels of timestep refinement. For any given timestcp 
level I, the K operation "kicks" particles inside any boxes having that timestep level, applying to each an impulse I77 ■ (—dVm/dx), 
where the timestep t; = 2~ 1 tq. These impulses change the particles' velocities, but not their positions. They conserve the particles' total 
linear momentum. The D operation "drifts" all particles for a time i-r;, changing their positions but not their velocities. 



The sequence can be pro- 
simple recursive algorithm: 



steps, as shown in figure 2. 
duced using the following 
StepG, r): 
if I > I 

max • 

Drift (t/2) 
else : 

Kick(/,r/2) 
StepG + 1,t/2) 
StepCZ + 1,t/2) 
Kick(;,r/2) 

This algorithm is called initially with I = and r = To. 
Each Kick(/,r/2) operation applies an impulse — irVVj;) 
to all particles, which changes the particles' velocities but 
not their positions. The Drift operation moves the particles 
once the complete set of impulses has been applied. 

This algorithm requires a factor ~ / max fewer kick op- 
erations (and therefore fewer expensive force evaluations) 
than a simple leapfrog with a single timestep 2~ imax T ( ). It is 
obvious that it conserves linear momentum and is reversible 
in time. Unlike the integrator in Duncan, Levison & Lee 
(1998), however, it is not symplectic; the discontinuities in 
the accelerations at box boundaries mean that the Poincare 
integral invariants are not conserved. 



4 TESTS AND COMPARISONS 

I have carried out a number of simple tests with small num- 
bers of particles (1 < N < 20) to confirm that my implemen- 
tation of the ideas above really does respect Newton's third 
law and conserve linear momentum. These small-iV tests 
serve only as minimal sanity checks; as stressed by Knebe, 
Green & Binney (2001), truly interesting tests of a collision- 
less code come not from testing how faithfully it reproduces 
the solution to the two-body problem, but rather from its 
ability to model collisionless systems accurately using large 
numbers of particles. 

In this section I use some simple collisionless galaxy 
models to test GROMMET's potential solver and integrator, 
comparing results obtained from GROMMET against those 
obtained from two other codes. Both of the other codes 
are available as part of the NEMO package. The first is 
the fast tree code described in Dehnen (2002). It obtains 
accelerations from a Cartesian multipole expansion. This 
respects Newton's third law and a standard leapfrog in- 
tegrator built around this potential solver then naturally 
conserves linear momentum. (A multiple-timestep version is 
also available, but it does not conserve momentum.) The 
second code (Hernquist & Ostriker 1992) uses the so-called 
"self-consistent field" (SCF) method, which represents the 
density and potential using a truncated basis function ex- 




Figure 3. Fractional errors in the accelerations at randomly se- 
lected positions within and around an N = 10 7 -particle realiza- 
tion of a truncated power-law sphere. The lower set of points plot 
results calculated using the potential solver of section 2 using 8 
levels of refinement of a 60 mesh with x max = 2. The upper set 
(offset by 0.1 vertically) are for results obtained using a tree code 
with fixed softening length e = 10~ 2 . 
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Figure 4. Fractional errors in the accelerations inside a 10 - 
particle realization of a power-law sphere, a factor of 10 more 
particles than in figure 3. The lower set of points plot results 
obtained using the potential solver of section 2 with the same 
set of nested boxes and 60 3 mesh employed for figure 3. The 
middle and upper set show the effects of using finer meshes with 
124 3 (middle) and 252 3 (upper) cells, offset by 0.04 and 0.08 
respectively. 



pansion. It shows no respect for Newton's third law, but, 
like GROMMET, is optimized for modelling single galaxies. 



4.1 Static tests 

Real galaxies have steep central density cusps (e.g., Lauer 
ct al. 1995), so an obvious test of the potential solver is to 
check the accelerations it returns for an iV-body realization 
of a truncated power-law sphere with density profile 



p(r) oc 



r a , if r < r max , 
0, otherwise. 



(11) 



I have generated a realization with r max = 1, a = 2 having 
10 7 equal-mass particles and used eq. (3) to calculate accel- 
erations at randomly selected positions inside and around 
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the sphere. For this I use a toplevel box enclosing the re- 
gion | as | < 2 together with eight levels of refinement, the 
boundary of the i th subbox being given by |x| = 2 1_ \ Fig- 
ure 3 plots the fractional difference between the results of 
this procedure against the exact, analytical expression for 
the acceleration. For radii 2~ 5 < r < 1 the RMS fractional 
error is only 0.0023, rising to 0.007 for 2~ 8 < r < 2~ 5 , within 
which there are relatively few particles. The source of this 
good and desirable behaviour is the decrease in the effective 
softening length as one moves to smaller length scales. 

For comparison, the upper set of points in figure 3 plot 
the errors in the accelerations of the same 10 7 -particle sphere 
calculated at the same positions using the tree code FALCON 
with softening kernel P2 and fixed softening length e = 10 _ . 
The RMS fractional error in the resulting accelerations for 
radii 2~ 5 < r < 1 is 0.011, over four times larger than GROM- 
met's, while for r < 2~ 5 , the calculated accelerations be- 
come systematically too low. FALCON takes about 2.5 times 
longer than GROMMET to produce these results and needs 
more than three times the memory. 

Perhaps the most worrying feature of the nested box 
scheme of section 2 is that the accelerations (3) are discon- 
tinuous at box boundaries. One can see some hints of this 
discontinuity in figure 3 at log 2 r = — 1, —2, —3, but it is 
even clearer in figure 4 which plots the fractional errors in 
a 10 8 -particle realization. Even if one were to run a simu- 
lation with such large N, the discontinuity itself is unlikely 
to be important because the integration scheme in section 3 
does not depend explicitly on the derivatives of the accelera- 
tions (but the discontinuity does mean that the integrator is 
not symplectic, as noted earlier). More important is the fact 
that if the discontinuity is noticeable it means that the bias 
in the estimates of the accelerations has become significant. 
The natural solution is then to move to a finer mesh (e.g., 
124 3 cells instead of 60 3 , figure 4). 



4.2 Dynamical tests 

For the dynamical tests I use a spherical isotropic Hernquist 
(1990) model with density profile 



p(r) = 



Ma 



2nr(a + r) 3 ' 



(12) 



This idealized model is in equilibrium. Then by Jeans' theo- 
rem (Binney & Tremaine 1987) its DF fo(x,v) can depend 
on (x, v) only through the integrals of motion, which are the 
energy £ and angular momentum J per unit mass. Since the 
model is isotropic the DF cannot depend on the latter and 
so / = /„(£). 

A straightforward procedure for generating initial con- 
ditions (hereafter ICs) corresponding to this model would 
be to draw N particles directly from fo{£), assigning each a 
mass M/N. Integrating (12), the fraction of particles inside 
radius r would then be r 2 / (a + r) 2 , showing that there would 
be relatively few particles with radii r«o, deep inside the 
interesting r _1 central density cusp. To improve resolution 
near the centre, I instead generate initial conditions using 
a multi-mass scheme, drawing particles from an anisotropic 
sampling DF (Leeuwin, Combes & Binney 1993) with num- 
ber density 




r/a 



Figure 5. Inner density profile of the same realization of a Hern- 
quist model after it has been evolved for 10 time units using a sim- 
ple leapfrog integrator with accelerations obtained using different 
potential solvers: GROMMET (light solid curve), FALCON (dotted 
curve) and the SCF method (dashed). The heavy solid curve plots 
the density profile of the the initial conditions. 



where (Sigurdsson, Hernquist & Quinlan 1995) 



l>(S, J 2 ) = Ax { I P « ) 



if r pcli < a, 
otherwise, 



(14) 



M£,J 2 )=h(£,J 2 )f (£), 



(13) 



r pe ri(£,J 2 ) is the particle's pericentre radius and the con- 
stant A is chosen to normalize f s . When the parameter 
A = 0, the sampling DF f 3 is identical to fo(£). Increas- 
ing A improves the sampling of the cusp by increasing the 
number density of particles having pericentres r por i < a. To 
balance this increase in number density each particle is as- 
signed a mass Mfo/Nf a = M/Nh{£, J 2 ) so that the phase- 
space mass density is still given by the desired fo{£) 

For the tests below I adopt units G = M — a = 1 and 
draw 2 x 10 6 particles with radii in the range 10 -3 < r < 10 2 
from the sampling DF (13) with A = 1. Poisson noise in the 
resulting distribution of particles makes it slightly asymmet- 
ric, which has two unwanted consequences (see also McMil- 
lan & Dehnen 2005). First, the centre of mass of the system 
moves with a constant velocity of order ~ 10~ 3 (GM/a) 1 ^ 2 
because the total linear momentum of the particles is small, 
but non-zero. Second, the asymmetry quickly destroys the 
inner part of the r~ l density cusp, even when viewed a frame 
co-moving with the centre of mass. To remove both of these 
effects, I extend my ICs to include the mirror distribution 
obtained by reflecting each of the 2 x 10 6 particles with 
(x,v) -> (-x,-v). The full ICs then have N = 4 x 10 6 
particles. 



4-2.1 Evolution of an (almost) equilibrium model 

Of course, one does not expect an iV-body model evolved 
from these ICs to be in perfect equilibrium; the ICs omit 
particles outside the range 10~ 3 < r < 10 2 and are con- 
structed assuming the exact potential corresponding to the 
density distribution (12) instead of the softened potential 
used in the iV-body code. Nevertheless, it is interesting to 




Figure 6. RMS fractional change in the angular momenta of particles in the models of figure 5, measured from t = to t = 10 and 
plotted as a function of the particles' pericentre radii. The same random selection of particles is used to generate each panel. 



compare the evolution of the TV-body model obtained from 
GROMMET with those obtained from the other two codes. 

Figure 5 shows the density profile of the models after 
10 time units (or ~ 66 circular orbit periods at r = 0.01). 
All three models use the same simple leapfrog integrator 
with timestep 2 x 10 , only the source of the accelera- 
tions is different. For GROMMET I use boxes with bound- 
aries at | as | = 100 x 2~ l for i = 0, . . . , 12. Each box has 
60 3 assignable cells, the cell length varying from 3.33 in the 
toplevel box down to 0.8 x 10~ 3 in the innermost box. fal- 
con's results are obtained using kernel P2 with softening 
length e = 10~ 3 , while the SCF expansion uses the Hern- 
quist & Ostriker (1992) basis function expansion truncated 
at n m ax = 6 radial and Z max = 4 angular terms. 

The results in figure 5 are unsurprising. The density at 
the very centre of the GROMMET and FALCON models falls 
slowly because because the ICs omit particles with radii r < 
10~ 3 and do not take into account the softening in these 
codes. In contrast, the density profile of the SCF model does 
not change significantly because its basis function expansion 
is incapable of producing anything that deviates strongly 
from a Hernquist model on small spatial scales. 

Much more is happening at the level of individual orbits, 
however. All of these models begin with spherical symme- 
try and remain spherical, apart from the effects of Poisson 
noise. Therefore the amount of diffusion in the angular mo- 
mentum J of their particles' orbits serves as a convenient 
measure of how far each code is from being perfectly colli- 
sionless. Figure 6 shows that the particles in all three models 
suffer from significant amounts of diffusion. The SCF model 
shows the least diffusion, but it is only marginally better 
than GROMMET; although the SCF potential remains close 
to the exact Hernquist potential, the flickering of the ex- 
pansion coefficients with time makes the orbits diffuse just 
like in any other code. The diffusion is worst in the FALCON 
model, particularly for orbits having pericentres much larger 
than its fixed softening length e = 10~ 3 . All of these results 
are based on the variation in orbits' angular momentum in 
models integrated from t = to t = 10, but I find similar 
results for models integrated from, say, t = 10 to t = 50 
when scaled to account for the longer timescale over which 
the diffusion occurs. 

The results presented so far have been obtained using an 
integrator with a single small timestep, but the dynamical 



Code Time 

FALCON 2.1 

SCF 1.3 

GROMMET 1.0 

GROMMET 0.3 

GROMMET 0.16 



'max , f-max 



Comment 

single timestep 
single timestep, (n n 
single timestep 

four levels of timestep refinement 
seven levels of timestep refinement 



imax) = (6, 4) 



Table 1. Comparison of time required for different codes to in- 
tegrate the multi-mass Hernquist model of section 4.2, relative 
to the single-timestep implementation of GROMMET. Neither the 
GROMMET nor the SCF models take advantage of the reflection 
symmetry of this simple problem. 



time inside the cusp of a Hernquist model varies with radius 
r approximately as r 1//2 . As, e.g., Zemp et al. (2007) have 
argued, it is natural to advance particles using a timestep 
proportional to the local dynamical time. We can come close 
to the optimal r oc r 1 * 12 scaling by using the block-timestep 
integrator of section 3.2 above and halving the timestep on 
every second subbox. To test the practicality of this scheme, 

1 have run a model with timesteps r = 32 x 10 -3 for par- 
ticles with I a; I > 100 x 2 ~ 1.5, shrinking by a factor of 
two at the boundaries |x| = 100 x 2~ l of boxes i = 6, 8, 
10 and 12. In the innermost (i — 12) box the timestep is 

2 x 10~ 3 , the same used for the single-timestep run above. 
This multiple-timestep model yields results almost indistin- 
guishable from the single-timestep GROMMET model plotted 
in figures 5 and 6, but is three times faster (see table 1). If it 
were appropriate to halve the timestep at all box boundaries 
i = 6, . . . , 12 (see below for an example) then the block- 
timestep scheme would yield a sixfold increase in speed over 
the single-timestep integrator. 



4-2.2 Response to an adiabatically grown blob 

For a slightly more interesting test, I model the growth of 
a black hole at the centre of a galaxy by slowly adding a 
Plummer sphere potential 



$b(x;i) 



GM h {t) 
Vr 2 + b' 2 



(15) 



to a multi-mass Hernquist model. The scale radius of the 
Plummer sphere b — 2 x 10 -3 and its mass grows with time 
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Figure 7. The results of adiabatically adding a Plummcr sphere 
potential to an initially isotropic Hcrnquist model. The Plummcr 
sphere has radius 2 X 10 _3 a and a final mass 2 X 10~ 3 Af ga i. The 
results obtained using GROMMET 's multiple-timestep scheme are 
almost identical to those calculated from Young's (1980) method. 



as (Sigurdsson, Hernquist & Quinlan 1995) 



M b (t) = M f x 



(16) 



otherwise, 



its final mass Mf = 2 x 10 being reached in a time t g — 5. 

A safe, formal way of including the effects of this exter- 
nal potential in GROMMET is to add an extra term 



E 



k- 



t 



"To 



(17) 



to the Hamiltonian (10). Integrating the resulting equations 
of motion then leads to the modifications needed in the 
block-timestep algorithm (section 3.2). In this case the nec- 
essary modifications are obvious, but for more realistic situ- 
ations (e.g., if the mass of the external source did not change 
in time and if the location of its centre were not pinned to 
x = 0) then it is helpful to start from (10) to ensure that 
the perturbation is turned on at the appropriate times and 
momentum conserved. 

As above, I use a nested series of boxes with boundaries 
at \x\ = 100 x 2~ ! with i = 0, . . . , 12, each box covered by 
a 60 3 mesh. Boxes to 5 share a common timestep to = 
5 x 10~ 3 . This is refined in every subsequent box, so that 
the timestep associated with box i ^ 6 is 2 5 ~ ! tq and the 
innermost box (\x\ < 0.024) has timestep ~4x 10~ 5 . 

My initial conditions consist of 10 6 particles drawn from 
the sampling DF (13) above. The artificially imposed poten- 
tial at x = means that this simulation only makes sense 
if the particles' centre of mass is also at x = 0. As an alter- 
native to symmetrizing the ICs as before, I instead modify 
step (i) of the PM method (section 2) to reflect the parti- 
cle distribution through each of the planes x = 0, y = 0, 
z = when assigning mass to meshes. This increases the ef- 
fective N used for the potential by a factor of 8 at little cost. 
The density profile of the final model is plotted in figure 7. 



It agrees well with the predictions obtained using Young's 
(1980) method. 



5 SUMMARY 

I have described GROMMET, a fast, economical particle- 
multiple-mesh JV-body code designed specifically for mod- 
elling the dynamical evolution of individual galaxies. In 
other words, it is designed to tackle almost exactly the same 
type of problem to which the SCF method (Hernquist & Os- 
triker 1992) is applied. Indeed, GROMMET can - loosely - be 
thought of as a variant of the SCF method using a Cartesian 
basis function expansion with millions of expansion coeffi- 
cients (the density at each mesh vertex in each of the nested 
boxes). Any application of the SCF method requires that 
one make a careful choice of the basis functions used to rep- 
resent the density and potential. Similarly, in GROMMET one 
has to choose, by hand, the set of nested boxes to use. 

For a realistic model galaxy with N > 10 6 , the single- 
timestep incarnation of GROMMET is comparable in speed to 
an SCF code using a low-order basis expansion and shows 
comparable amounts of relaxation. For most applications, 
however, GROMMET will be much faster: its nested-box po- 
tential solver admits an efficient natural block-timestep in- 
tegrator (section 3.2), leading to an approximate three- to 
six-fold increase in speed for realistic galaxy models; the 
SCF method typically requires a fairly high-order expan- 
sion to produce (reasonably) unbiased results (e.g., Holley- 
Bockelmann, Weinberg & Katz 2005), which makes it much 
slower in practice. But perhaps the main advantage of 
GROMMET over SCF methods based on spherical harmonic 
expansions is that it respects Newton's third law and is 
therefore suitable for use in studying I — 1 perturbations 
without fear of artefacts due to centring. 

To my knowledge, the tree code FALCON (Dehnen 2002) 
is the only other code that can model realistically inhomoge- 
neous galaxies without breaking the third law. For N > 10 6 
GROMmet's potential solver is more than twice as fast as 
falcon's and much less memory hungry. This efficiency 
comes at a cost though, since GROMmet's nested-box scheme 
is optimized for modelling perturbations of single galax- 
ies. It would be interesting to see whether the potential- 
splitting scheme used here (section 3.2; Duncan, Levison & 
Lee (1998)) works as well for FALCON, or indeed any other 
code that respects the third law, as it does for GROMMET. 



ACKNOWLEDGMENTS 

I thank James Binney, Walter Dehnen and Ben Moore for 
helpful discussions, and the Royal Society for financial sup- 
port. 



REFERENCES 

Anninos P., Norman M. L., Clarke D. A., 1994, ApJ, 436, 
11 

Binney J., Tremaine S., 1987, Galactic Dynamics, Prince- 
ton Univ. Press, Princeton, NJ 
Dehnen W., 2002, J. Comput. Phys., 179, 27 



8 S. J. Magorrian 



Duncan M. J., Levison H. F., Lee M. H., 1998, AJ, 116, 
2067 

Gelato S., Chernoff D. F., Wasserman I., 1997, ApJ, 480, 
115 

Hernquist L., 1990, ApJ, 356, 359 
Hernquist L., Ostriker J. P., 1992, ApJ, 386, 375 
Hernquist L., Weinberg M. D., 1989, MNRAS, 238, 407 
Hockney R. W., Eastwood J. W., Computer simulation us- 
ing particles, 1988, IoP 
Holley-Bockelmann K., Weinberg M., Katz N., 2005, MN- 
RAS, 363, 991 
James R. A., 1977, J. Comp. Phys., 25, 71 
Jessop C, Duncan M., Chau, W. Y., 1994, J. Comput. 

Phys., 115, 339 
Knebe A., Green A., Binney J., 2001, MNRAS, 325, 845 
Kravtsov A. V., Klypin A. A., Khokhlov A. M., 1997, ApJS, 
111, 73 

Lauer T. R., et al., 1995, AJ, 110, 2622 
Leeuwin F., Combes F., Binney J., 1993, MNRAS, 262, 
1013 

McMillan P. J., Dehnen W., 2005, MNRAS, 363, 1205 
Sana P., Tremaine S., 1992, AJ, 104, 1633 
Sigurdsson S., Hernquist L., Quinlan G. D., 1995, ApJ, 446, 
75 

Velazquez H., White S. D. M., 1999, MNRAS, 304, 254 
Weinberg M. D., 1989, MNRAS, 239, 549 
Wisdom J., Holman M., 1991, AJ, 102, 1528 
Young P., 1980, ApJ, 242, 1232 

Zemp M., Stadel J., Moore B., Carollo C. M., 2007, MN- 
RAS, 376, 273 



APPENDIX A: JAMES' METHOD 

James (1977) describes an economical method for calculat- 
ing the solution to Poisson's equation, 



V 2 $ = -g, 



(Al) 



discretized on a regular mesh and with a spatially bounded 
source distribution q(x), so that d^/dr — > as r — > co. 
It is easiest to explain his method for the electrostatic case 
in which q is electric charge density and $ is electrostatic 
potential. The method then consists ol the following steps: 

(i) enclose the charge distribution q inside an earthed 
metal box and calculate the potential 4>{x) inside the box 
subject to the boundary condition (f> — on the box surface; 

(ii) use Gauss' law to find the charge distribution Q in- 
duced on the surface of the box; 

(iii) calculate the potential ip(x) due to this surface 
charge distribution Q. 

The solution to (Al) for the isolated charge distribution q 
is given by $ = <j> — iji. Since this procedure is at the heart 
of GROMMET's potential solver, I explain it in some detail 
below. 



Al Preliminaries 

Throughout the following, I assume that the distribution 
q(x) has been discretized onto a cubical mesh with vertices 



at positions Xijk = (i,j,k), < k ^ n, spaced unit dis- 
tance apart. Our goal is to calculate the discretized potential 
&ijk corresponding to this qtj k . 

A straightforward way of doing this is to use the Fourier 
convolution theorem. Consider first the situation in which 
the charge distribution qj is one-dimensional with < j < 
2n; the reason for extending the mesh from n + 1 to 2n ver- 
tices will become apparent shortly. The discretized potential 
is given by the convolution 



Gtqk- 



(A2) 



where Gk is the contribution to $ fe made by a unit-charge 
particle located at xo, and we take qk = for k < or 
k ^ 2n since we have an isolated charge distribution. 

This last condition on q is awkward. Suppose instead 
that both that Gk and q k were 2n-periodic, with g_ fe = 
q2n-k, and let us impose the sensible condition that Gk is 
even with Gk = G-z-n-k and that Go = 0. Then <&j could be 
obtained economically using Fourier methods. The Fourier 
transform of qi is given by 



2n-l 

H ex P 

3=0 



(A3) 



where i = \J— 1, and similarly for G a . Using the discrete 
convolution theorem, equation (A2) becomes simply 

= G a q a . (A4) 

Applying the inverse transform, the potential is given by 

, 2n-l 



- E ^exp 



a = 



(A5) 



The periodicity needed for application of the discrete con- 
volution theorem is a nuisance, but if we allow qi to be 
non-zero only for i ^ n, then the $o • • • $n obtained 
from equ. (A2) are unaffected by it. Therefore, we can use 
this Fourier method to obtain the potential <E>; correspond- 
ing to an isolated source distribution g< (0 < i < n) pro- 
vided we extend the mesh by a further n — 1 points with 
q-n+i = • • • = qin-\ = 0. Thanks to the existence of fast 
methods for evaluating the transforms (A3) and (A5), the 
Fourier method requires only O(nlogn) operations to cal- 
culate the full set of instead of the 0(n 2 ) involved in a 
direct evaluation of equ. (A2). The savings are much more 
dramatic for the three-dimensional case, for which the direct 
sum takes 0(n 6 ) operations, compared to only C((nlogn) 3 ) 
for the Fourier method. 

James' method makes use of an alternative view of this 
"doubling up" procedure. The Fourier transform (A3) can 
be written as 



q a = 2[q a (C)+iq a (S)], 

where the cosine and sine transforms 

n 

a,^\ Kid 

j=0 
n-l 

z — ' n 



(A6) 

(A7) 
(A8) 
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come from the even and odd parts of qj, 

g J ± = ^(*±g 2 n- J ), (A9) 

respectively. The coefficients ci = • • • = c„_i = 1, but 
co = c n = | to account for the fact that qo and q n are 
counted only half as often as the other g^. Conversely, hav- 
ing both q a (C) and q a (S) we can reconstruct the original 
qj by substituting (A6) into the expression for the inverse 
transform (A5) to obtain 



q 3 =qAC)+q 3 (S), 
where 

<lAC) = lY.c aq a {C) cos 



a=0 
n-1 



ty j a 
n 



(A10) 



(All) 



(A12) 



are the inverse cosine and sine transforms of q a (C) and 
q a (S) respectively. Thus, apart from a factor of (2/n), the 
cosine and sine transforms are their own inverses. 

Now suppose that only go • • • q n are allowed to be non- 
zero. Then qf = q~ = q t , except for the unused g,7 = 
q~ — 0. Replacing q by $ in equs. (A10-A12) and taking 
& a from (A4), the potential can be written as 



*i = *i(C)+^(5), 
where 

$j(C) = - ^c a G a q a {C) cos 



a = 
n-1 



■Kja 



^( 5 ) = ^E G ^ a (S)sin^, 



(A13) 



(A14) 



(A15) 



and G a = 2G a (C) with no contribution from the sine trans- 
form of the even function Gi. 

The generalization to three dimensions is straightfor- 
ward. The Fourier transform in each direction splits into a 
sum of cosine and sine terms, yielding a total of eight terms: 

$ !jfe = $ !jfe (CCC) + $ ijk (CCS) + $ ijk (CSC) 

+ $ ijk (CSS) + <S> ljk (SCC) + $ ijk (SCS) 

+ $ ijk {SSC) + $ ijk (SSS), (A16) 

where, for example, 

n n — In — 1 

$ ijk (CSS) = ^EEE Ci* afh, (CSS)x (A17) 

i=0 j = l k = l 

nia . nj/3 . Tik-y 

cos sm sm , 

n n n 

with $ a ?-<(CSS) = G a ^q a01 (CSS) and 



n n — 1 n — 1 



q**{CSS) = J2 E E c ^ cos sin sin 

i = j = l k = l 



ma . -kjB . nicy 
— sm sin . 

n n n 



(A18) 



Notice that this decomposition into sine and cosine trans- 
forms results in two transforms for each of eight n 3 meshes. 
It requires less memory than the equivalent single (2n) 3 zero- 
padded mesh used in the "doubling-up" procedure, but for 



general qij k and Gij k it offers no improvement in speed. It 
simplifies enormously, however, for the special case in which 
Gij k is the Green's function of the discretized Laplacian ap- 
pearing in equ. (Al). James' method exploits these simpli- 
fications, particularly in dealing with the hollow induced 
surface charge distribution Q (see section A4 below). 



A2 The potential of a charge distribution inside 
an earthed box 

With this background in hand, let us turn to the details of 
James' method. Poisson's equation (Al) can be written 



(A$) i:7 - fc = -q ijk 



(A19) 



where the first-order approximation to the Laplacian oper- 
ator 

(A$) i3fc EE $ i+ l,j,fc + + *i,j+l,fc + + 

+*»,j,fc+i + *»,j,fc-i - 6$ij k - 

(A20) 

The potential cj>ij k of the earthed box is given by the solution 
of this equation subject to the boundary condition <f>ojk = 
<t>njk = 4>i0 k = ^ink = 0ijO = <t>ijn = 0. Applying the triple 
sine transform, we have that 

<f> aM {SSS) = q apl {SSS)/C a ^ (A21) 
where 

^ = 2 v 1 - C ° S V) + 2 i 1 C ° S + 2 - C ° S ?) ' 

(A22) 

Although we could immediately apply the inverse trans- 
form (A16) to obtain cf>ij k explicitly, it turns out that this 
is unnecessary and it is more efficient to use the method 
of equivalent charges (see below) to modify (p al3 ^(SSS) to 
include the effects of the potential tp corresponding to the 
induced surface charge distribution Q, saving everything for 
a single inverse transform at the very end of the calculation. 

A3 The charge distribution induced on the faces 
of the box 

The charge distribution induced on the i = face of the box 
is given by 

Qoj k = — (A0)ojfc = — (f>ij k , (A23) 

the last equality following from the fact that <f> is zero both 
on the box surface (i = 0) and outside the box (i = — 1). 
Similarly, the charge distribution induced on the opposite 
i = n face is Q n jk = <Pn-i,jk- Notice that Q vanishes along 
the edges of the box, and so is completely specified by its 
double sine transform on each of the six faces. In terms of 
4> al3l {SSS) these can be written 



Qp{SS) = - E <p afht (SSS) sin '■ 



a = l 

n-1 



Q^(SS) = -^T(-l) a <t> al3 ' l (SSS)sm—, 

71 ^ « 



(A24) 



(A25) 



and similarly for the other four faces. We can invert each of 
these to obtain Qojk etc and, from these, any of the other 
three transforms Q O 0J (SC), Q ^(CC), Q ^'{CS). 
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A4 The potential of the induced charge 
distribution 

Equation (A16) provides a way of obtaining the poten- 
tial Vijfc corresponding to this induced charge distribu- 
tion Q. The result is a sum of eight terms, all of which 
can be treated in the same way. For example, consider 
the term ^ k (CSS). Its Fourier transform ^''(CSS) = 
G a0 -<Q a0 -<(CSS), where, using (A18) and the hollowness 
of Q, 

Q a ^{CSS) = Q,f 7 (S5) + (-l) a Q^{SS). (A26) 

The other terms can be written in a similar way, although 
the SSS term vanishes. The G Q ^ 7 used here should be the 
triple cosine transform of the Green's function for the first- 
order Laplacian (A20). This need be calculated just once 
(e.g., using the doubling-up procedure), the necessary ele- 
ments being stored for subsequent use. 

It would be possible to use eq. (A16) to obtain i/v, fc 
directly, but it turns out (see below) that this labour is un- 
necessary and that it suffices to use (A16) to obtain only 
the face potentials tpojk, i'njk etc. Nevertheless, adding up 
all the contributions to each of these turns out to be the 
main computational burden of James' method. 

A5 The method of equivalent charges 

Instead of synthesizing ipijk explicitly, let us introduce an- 
other potential ipljk which is zero on the box faces but ev- 
erywhere else is equal to ipijk ■ Because it vanishes at the box 
boundaries, this new potential is completely specified by its 
triple sine transform. The "equivalent charge" distribution 
Eij k that generates it can be found using Poisson's equation 

V 2 [i> - ^ E) ] = -{Q-E], (A27) 

where Q is non-zero only on the faces of the box. For the 
first-order discretized Laplacian (equ. A20) E is confined to 
the planes i = 1, J = 1, fc = 1, i = n — 1, j = n — 1 and 
k = n-l, with, for example, Ef J (SS) = ^(SS). 

The triple sine transform of the full potential is then 

$ af,y (SSS) = <p a01 (SSS) - E aP '«(SSS)/C al31 (A28) 

the second term giving the contribution of ip^ E \ Apply- 
ing the inverse triple-sine transform to (A28) gives &ijk for 
1 ^ k < n. Finally, the missing face potentials can be 
inserted using the results obtained in section A4. 

A6 Performance 

My implementation of this procedure uses the fast sine and 
cosine transforms written by Takuya Ooura. 2 The triple- 
sine transforms involved in going from q ijk to cj> a0 ~>(SSS) 
(eq. A21) and from 3> a0 ~>(SSS) to $ ijfe (eq. A28) then take 
only ~ 30% of the total time needed to go from q^k to &ijk, 
with the evaluation of the various transforms of the face 
charge distributions (Sec. A3) accounting for a further 10%. 
The remaining 60% of the time is spent simply accumulating 
the various contributions to the face potentials (sec. A4). 
Nevertheless, for typical 65 3 or 129 3 meshes I find that my 



implementation of James' method is about 60 to 70% faster 
than the usual doubling-up procedure. 

I have focused here on describing James' method using 
the first-order approximation of the Laplacian (A20). James 
(1977) shows how it is possible to apply the same ideas to 
higher-order approximations, albeit at the expense of much 
more involved book-keeping. I find that the resulting mi- 
nor changes in the Green's function Gijk have no detectable 
effect for the realistic large- N situations described in sec- 
tion 4. 



2 http : //kurims . kyoto-u . ac . jp/" ooura/ f f t . html 



